tea cup
Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models
Zhao, Juntu; Deng, Junyu; Ye, Yixin; Li, Chongxuan; Deng, Zhijie; Wang, Dequan
Advances in text-to-image diffusion models have enabled a broad range of downstream practical applications, but such models often suffer from misalignment between text and image. Consider the generation of a combination of two disentangled concepts: given the prompt "a tea cup of iced coke", existing models usually generate a glass cup of iced coke, because iced coke usually co-occurs with a glass cup rather than a tea cup during model training. The root of such misalignment is attributed to confusion in the latent semantic space of text-to-image diffusion models, and hence we refer to the "a tea cup of iced coke" phenomenon as Latent Concept Misalignment (LC-Mis). We leverage large language models (LLMs) to thoroughly investigate the scope of LC-Mis, and develop an automated pipeline for aligning the latent semantics of diffusion models to text prompts. Empirical assessments confirm the effectiveness of our approach, which substantially reduces LC-Mis errors and enhances the robustness and versatility of text-to-image diffusion models. Our code and dataset are available online for reference.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > New York (0.04)
- North America > United States > California (0.04)
- Asia > China > Shandong Province (0.04)
Faces for kettles: data collection industry flourishes as China pursues AI ambitions
At the front of the line, a woman stands facing a camera zip-tied to a tripod. In front of her face she holds a photograph of her own head with the eyes and nose cut out, and she slowly rotates from side to side. Villagers waiting their turn take a numbered ticket. Some of them say it's the third or fourth time they've come to do this sort of work. The project, run out of a sleepy village courtyard house adorned with posters of former Chinese leader Mao Zedong, is collecting material that could train artificial intelligence (AI) software to distinguish between real facial features and still images.
- Information Technology > Artificial Intelligence > Vision > Face Recognition (0.37)
- Information Technology > Communications > Social Media > Crowdsourcing (0.33)